import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
coffee = pd.read_csv("df_arabica_clean_.csv", index_col=0)
coffee.head(3)
ID | Country of Origin | Farm Name | Lot Number | Mill | ICO Number | Company | Altitude | Region | Producer | ... | Defects | Total Cup Points | Moisture Percentage | Category One Defects | Quakers | Color | Category Two Defects | Certification Body | Certification Address | Certification Contact | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Colombia | Finca El Paraiso | CQU2022015 | Finca El Paraiso | NaN | Coffee Quality Union | 1930 | Piendamo,Cauca | Diego Samuel Bermudez | ... | 0 | 89.33 | 11.8 | 0 | 0 | green | 3 | Japan Coffee Exchange | 〒413-0002 静岡県熱海市伊豆山1173−58 1173-58 Izusan, Ata... | 松澤 宏樹 Koju Matsuzawa - +81(0)9085642901 |
1 | 1 | Taiwan | Royal Bean Geisha Estate | The 2022 Pacific Rim Coffee Summit,T037 | Royal Bean Geisha Estate | NaN | Taiwan Coffee Laboratory | 1200 | Chiayi | 曾福森 | ... | 0 | 87.58 | 10.5 | 0 | 0 | blue-green | 0 | Taiwan Coffee Laboratory 台灣咖啡研究室 | QAHWAH CO., LTD 4F, No. 225, Sec. 3, Beixin Rd... | Lin, Jen-An Neil 林仁安 - 886-289116612 |
2 | 2 | Laos | OKLAO coffee farms | The 2022 Pacific Rim Coffee Summit,LA01 | oklao coffee processing plant | NaN | Taiwan Coffee Laboratory | 1300 | Laos Borofen Plateau | WU TAO CHI | ... | 0 | 87.42 | 10.4 | 0 | 0 | yellowish | 2 | Taiwan Coffee Laboratory 台灣咖啡研究室 | QAHWAH CO., LTD 4F, No. 225, Sec. 3, Beixin Rd... | Lin, Jen-An Neil 林仁安 - 886-289116612 |
3 rows × 37 columns
coffee.shape
(207, 37)
coffee.info()
<class 'pandas.core.frame.DataFrame'> Index: 207 entries, 0 to 206 Data columns (total 37 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 ID 207 non-null int64 1 Country of Origin 207 non-null object 2 Farm Name 205 non-null object 3 Lot Number 206 non-null object 4 Mill 204 non-null object 5 ICO Number 75 non-null object 6 Company 207 non-null object 7 Altitude 207 non-null int64 8 Region 205 non-null object 9 Producer 206 non-null object 10 Number of Bags 207 non-null int64 11 Bag Weight 207 non-null object 12 In-Country Partner 207 non-null object 13 Owner 207 non-null object 14 Variety 201 non-null object 15 Status 207 non-null object 16 Processing Method 202 non-null object 17 Aroma 207 non-null float64 18 Flavor 207 non-null float64 19 Aftertaste 207 non-null float64 20 Acidity 207 non-null float64 21 Body 207 non-null float64 22 Balance 207 non-null float64 23 Uniformity 207 non-null float64 24 Clean Cup 207 non-null int64 25 Sweetness 207 non-null int64 26 Overall 207 non-null float64 27 Defects 207 non-null int64 28 Total Cup Points 207 non-null float64 29 Moisture Percentage 207 non-null float64 30 Category One Defects 207 non-null int64 31 Quakers 207 non-null int64 32 Color 207 non-null object 33 Category Two Defects 207 non-null int64 34 Certification Body 207 non-null object 35 Certification Address 207 non-null object 36 Certification Contact 207 non-null object dtypes: float64(10), int64(9), object(18) memory usage: 61.5+ KB
for column in coffee.columns:
distinct_values = len(coffee[column].unique())
print(f"{column}: {distinct_values} distinct values")
ID: 207 distinct values Country of Origin: 22 distinct values Farm Name: 173 distinct values Lot Number: 187 distinct values Mill: 163 distinct values ICO Number: 68 distinct values Company: 72 distinct values Altitude: 67 distinct values Region: 121 distinct values Producer: 173 distinct values Number of Bags: 55 distinct values Bag Weight: 39 distinct values In-Country Partner: 21 distinct values Owner: 80 distinct values Variety: 49 distinct values Status: 1 distinct values Processing Method: 11 distinct values Aroma: 19 distinct values Flavor: 19 distinct values Aftertaste: 20 distinct values Acidity: 19 distinct values Body: 17 distinct values Balance: 18 distinct values Uniformity: 3 distinct values Clean Cup: 1 distinct values Sweetness: 1 distinct values Overall: 21 distinct values Defects: 1 distinct values Total Cup Points: 81 distinct values Moisture Percentage: 46 distinct values Category One Defects: 6 distinct values Quakers: 11 distinct values Color: 12 distinct values Category Two Defects: 14 distinct values Certification Body: 21 distinct values Certification Address: 21 distinct values Certification Contact: 21 distinct values
coffee.drop(['ICO Number', 'Lot Number','In-Country Partner', 'Quakers',
'Certification Address', 'Certification Contact', 'Owner', 'Status', 'Overall', 'Certification Body', 'Certification Address', 'Certification Contact'], axis=1).head(3)
ID | Country of Origin | Farm Name | Mill | Company | Altitude | Region | Producer | Number of Bags | Bag Weight | ... | Balance | Uniformity | Clean Cup | Sweetness | Defects | Total Cup Points | Moisture Percentage | Category One Defects | Color | Category Two Defects | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Colombia | Finca El Paraiso | Finca El Paraiso | Coffee Quality Union | 1930 | Piendamo,Cauca | Diego Samuel Bermudez | 1 | 35 | ... | 8.42 | 10.0 | 10 | 10 | 0 | 89.33 | 11.8 | 0 | green | 3 |
1 | 1 | Taiwan | Royal Bean Geisha Estate | Royal Bean Geisha Estate | Taiwan Coffee Laboratory | 1200 | Chiayi | 曾福森 | 1 | 80 | ... | 8.25 | 10.0 | 10 | 10 | 0 | 87.58 | 10.5 | 0 | blue-green | 0 |
2 | 2 | Laos | OKLAO coffee farms | oklao coffee processing plant | Taiwan Coffee Laboratory | 1300 | Laos Borofen Plateau | WU TAO CHI | 19 | 25 | ... | 8.17 | 10.0 | 10 | 10 | 0 | 87.42 | 10.4 | 0 | yellowish | 2 |
3 rows × 27 columns
coffee.groupby('Country of Origin').agg({'Aroma': 'mean',
'Aftertaste': 'mean',
'Acidity': 'mean',
'Body': 'mean',
'Balance': 'mean',
'Sweetness': 'mean',
'Clean Cup': 'mean',
'Uniformity': 'mean',
}).sort_values(by='Aroma')
Aroma | Aftertaste | Acidity | Body | Balance | Sweetness | Clean Cup | Uniformity | |
---|---|---|---|---|---|---|---|---|
Country of Origin | ||||||||
El Salvador | 7.332857 | 7.261429 | 7.525714 | 7.475714 | 7.334286 | 10.0 | 10.0 | 10.000000 |
Thailand | 7.466667 | 7.526667 | 7.521667 | 7.603333 | 7.598333 | 10.0 | 10.0 | 10.000000 |
Brazil | 7.475000 | 7.359000 | 7.416000 | 7.426000 | 7.342000 | 10.0 | 10.0 | 10.000000 |
Nicaragua | 7.475714 | 7.240000 | 7.368571 | 7.500000 | 7.512857 | 10.0 | 10.0 | 10.000000 |
Costa Rica | 7.592500 | 7.532500 | 7.730000 | 7.708750 | 7.633750 | 10.0 | 10.0 | 10.000000 |
Vietnam | 7.645000 | 7.477500 | 7.522500 | 7.542500 | 7.605000 | 10.0 | 10.0 | 10.000000 |
Mexico | 7.667500 | 7.437500 | 7.522500 | 7.645000 | 7.457500 | 10.0 | 10.0 | 10.000000 |
Peru | 7.667500 | 7.415000 | 7.520000 | 7.352500 | 7.415000 | 10.0 | 10.0 | 10.000000 |
Guatemala | 7.697619 | 7.686667 | 7.758095 | 7.766190 | 7.726667 | 10.0 | 10.0 | 10.000000 |
Honduras | 7.706154 | 7.536923 | 7.607692 | 7.607692 | 7.570769 | 10.0 | 10.0 | 10.000000 |
Indonesia | 7.723333 | 7.610000 | 7.776667 | 7.640000 | 7.610000 | 10.0 | 10.0 | 10.000000 |
United States (Hawaii) | 7.734000 | 7.632000 | 7.666000 | 7.516000 | 7.700000 | 10.0 | 10.0 | 10.000000 |
Laos | 7.750000 | 7.496667 | 7.613333 | 7.526667 | 7.583333 | 10.0 | 10.0 | 10.000000 |
Colombia | 7.780526 | 7.618421 | 7.736842 | 7.653684 | 7.650526 | 10.0 | 10.0 | 10.000000 |
Panama | 7.790000 | 7.460000 | 7.795000 | 7.625000 | 7.585000 | 10.0 | 10.0 | 10.000000 |
Myanmar | 7.830000 | 7.670000 | 7.580000 | 7.580000 | 7.580000 | 10.0 | 10.0 | 10.000000 |
Uganda | 7.833333 | 7.610000 | 7.583333 | 7.666667 | 7.666667 | 10.0 | 10.0 | 10.000000 |
Kenya | 7.835000 | 7.625000 | 7.540000 | 7.585000 | 7.750000 | 10.0 | 10.0 | 10.000000 |
Taiwan | 7.842787 | 7.712131 | 7.765246 | 7.661311 | 7.730984 | 10.0 | 10.0 | 9.967213 |
Ethiopia | 7.864545 | 7.757273 | 8.016364 | 7.819091 | 7.774545 | 10.0 | 10.0 | 10.000000 |
Tanzania, United Republic Of | 7.903333 | 7.750000 | 7.863333 | 7.848333 | 7.776667 | 10.0 | 10.0 | 10.000000 |
Madagascar | 7.920000 | 7.580000 | 7.750000 | 7.580000 | 7.670000 | 10.0 | 10.0 | 10.000000 |
coffee.groupby('Country of Origin')['Country of Origin'].count().sort_values(ascending=False)
Country of Origin Taiwan 61 Guatemala 21 Colombia 19 Honduras 13 Thailand 12 Ethiopia 11 Brazil 10 Costa Rica 8 Nicaragua 7 El Salvador 7 Tanzania, United Republic Of 6 United States (Hawaii) 5 Peru 4 Mexico 4 Vietnam 4 Laos 3 Uganda 3 Indonesia 3 Panama 2 Kenya 2 Myanmar 1 Madagascar 1 Name: Country of Origin, dtype: int64
coffee.groupby(['Country of Origin', 'Company'])['Flavor'].mean().sort_values(ascending=False)
Country of Origin Company Laos Taiwan Coffee Laboratory 8.420 Ethiopia Marubeni corp., 8.250 Kerchanshe Trading PLC 8.170 Taiwan Dongbi coffee 8.170 Tanzania, United Republic Of DORMAN (T) LIMITED 8.085 ... Laos Marubeni Corporation 7.170 Nicaragua SAJONIA ESTATE COFFEE S.A. 7.170 Brazil marubeni 7.170 El Salvador Aprentium Enterprises LLC 7.110 Brazil Descafeinadores Mexicano SA. de CV 7.080 Name: Flavor, Length: 98, dtype: float64
plt.style.use("fivethirtyeight")
plt.figure(figsize=(20,7))
plt.xticks(rotation=90)
df1 = coffee.sort_values(by='Total Cup Points')
plt.bar(x=df1['Country of Origin'],
height=df1['Total Cup Points'].sort_values(ascending=False), width=0.5
)
plt.show()
df2 = coffee[['Country of Origin', 'Company','Number of Bags', 'Bag Weight']]
df2['Total_kg'] = coffee['Number of Bags'] * (coffee['Bag Weight'].str.replace('kg', ' ').astype('int64'))
df2
C:\Users\pirag\AppData\Local\Temp\ipykernel_8188\1904100434.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df2['Total_kg'] = coffee['Number of Bags'] * (coffee['Bag Weight'].str.replace('kg', ' ').astype('int64'))
Country of Origin | Company | Number of Bags | Bag Weight | Total_kg | |
---|---|---|---|---|---|
0 | Colombia | Coffee Quality Union | 1 | 35 kg | 35 |
1 | Taiwan | Taiwan Coffee Laboratory | 1 | 80 kg | 80 |
2 | Laos | Taiwan Coffee Laboratory | 19 | 25 kg | 475 |
3 | Costa Rica | Coffee Quality Union | 1 | 22 kg | 22 |
4 | Colombia | Coffee Quality Union | 2 | 24 kg | 48 |
... | ... | ... | ... | ... | ... |
202 | Brazil | Ipanema Coffees | 2240 | 60 kg | 134400 |
203 | Nicaragua | Exportadora Atlantic S.A | 300 | 30 kg | 9000 |
204 | Laos | Marubeni Corporation | 343 | 60 kg | 20580 |
205 | El Salvador | Aprentium Enterprises LLC | 1 | 2 kg | 2 |
206 | Brazil | Descafeinadores Mexicano SA. de CV | 600 | 60 kg | 36000 |
207 rows × 5 columns
coffee.groupby('Country of Origin')['Total_kg'].sum().sort_values(ascending=False)
Country of Origin Ethiopia 12389860.0 Brazil 320325.0 Guatemala 245370.0 Honduras 170490.0 Colombia 161512.0 Tanzania, United Republic Of 94200.0 Nicaragua 93630.0 Costa Rica 71014.0 Peru 56940.0 El Salvador 55581.0 Uganda 47400.0 Indonesia 38460.0 Mexico 30087.0 Laos 22615.0 Kenya 18065.0 Panama 12000.0 Taiwan 4317.0 Myanmar 3000.0 Vietnam 2491.0 United States (Hawaii) 1510.0 Thailand 431.0 Madagascar 300.0 Name: Total_kg, dtype: float64
Pirminė hipotezė iš dalies pasitvirtino, nes didžiausius balaus (Total cup points) už visus kriterijus (Aroma, Flavor, Aftertaste, Acidity, Body, Balance, Uniformity, Clean Cup, Sweetness) gavo pupelės iš Brazilijos. Tačiau paaiškėjo, kad daugiausia kavos gamintojų buvo iš Taivano, o daugiausia kavos pupelių gamintojai rinkosi iš Etiopijos.